12 research outputs found
DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores
We present our framework DKVF that enables one to quickly prototype and
evaluate new protocols for key-value stores and compare them with existing
protocols based on selected benchmarks. Due to limitations of CAP theorem, new
protocols must be developed that achieve the desired trade-off between
consistency and availability for the given application at hand. Hence, both
academic and industrial communities focus on developing new protocols that
identify a different (and hopefully better in one or more aspect) point on this
trade-off curve. While these protocols are often based on a simple intuition,
evaluating them to ensure that they indeed provide increased availability,
consistency, or performance is a tedious task. Our framework, DKVF, enables one
to quickly prototype a new protocol as well as identify how it performs
compared to existing protocols for pre-specified benchmarks. Our framework
relies on YCSB (Yahoo! Cloud Servicing Benchmark) for benchmarking. We
demonstrate DKVF by implementing four existing protocols --eventual
consistency, COPS, GentleRain and CausalSpartan-- with it. We compare the
performance of these protocols against different loading conditions. We find
that the performance is similar to our implementation of these protocols from
scratch. And, the comparison of these protocols is consistent with what has
been reported in the literature. Moreover, implementation of these protocols
was much more natural as we only needed to translate the pseudocode into Java
(and add the necessary error handling). Hence, it was possible to achieve this
in just 1-2 days per protocol. Finally, our framework is extensible. It is
possible to replace individual components in the framework (e.g., the storage
component)
Auditable Restoration of Distributed Programs
We focus on a protocol for auditable restoration of distributed systems. The
need for such protocol arises due to conflicting requirements (e.g., access to
the system should be restricted but emergency access should be provided). One
can design such systems with a tamper detection approach (based on the
intuition of "break the glass door"). However, in a distributed system, such
tampering, which are denoted as auditable events, is visible only for a single
node. This is unacceptable since the actions they take in these situations can
be different than those in the normal mode. Moreover, eventually, the auditable
event needs to be cleared so that system resumes the normal operation.
With this motivation, in this paper, we present a protocol for auditable
restoration, where any process can potentially identify an auditable event.
Whenever a new auditable event occurs, the system must reach an "auditable
state" where every process is aware of the auditable event. Only after the
system reaches an auditable state, it can begin the operation of restoration.
Although any process can observe an auditable event, we require that only
"authorized" processes can begin the task of restoration. Moreover, these
processes can begin the restoration only when the system is in an auditable
state. Our protocol is self-stabilizing and has bounded state space. It can
effectively handle the case where faults or auditable events occur during the
restoration protocol. Moreover, it can be used to provide auditable restoration
to other distributed protocol.Comment: 10 page
Ensuring Average Recovery with Adversarial Scheduler
In this paper, we focus on revising a given program so that the average recovery time in the presence of an adversarial scheduler is bounded by a given threshold lambda. Specifically, we consider the scenario where the fault (or other unexpected action) perturbs the program to a state that is outside its set of legitimate states. Starting from this state, the program executes its actions/transitions to recover to legitimate states. However, the adversarial scheduler can force the program to reach one illegitimate state that requires a longer recovery time.
To ensure that the average recovery time is less than lambda, we need to remove certain transitions/behaviors. We show that achieving this average response time while removing minimum transitions is NP-hard. In other words, there is a tradeoff between the time taken to synthesize the program and the transitions preserved to reduce the average convergence time. We present six different heuristics and evaluate this tradeoff with case studies. Finally, we note that the average convergence time considered here requires formalization of hyperproperties. Hence, this work also demonstrates feasibility of adding (certain) hyperproperties to an existing program
Automatic Addition of Fault-Tolerance in Presence of Unchangeable Environment Actions â€
We focus on the problem of adding fault-tolerance to an existing concurrent protocol in the presence of unchangeable environment actions. Such unchangeable actions occur in cases where a subset of components/processes cannot be modified since they represent third-party components or are constrained by physical laws. These actions differ from faults in that they are (1) simultaneously collaborative and disruptive, (2) essential for satisfying the specification and (3) possibly non-terminating. Hence, if these actions are modeled as faults while adding fault-tolerance, it causes existing model repair algorithms to declare failure to add fault-tolerance. We present a set of algorithms for adding stabilization and fault-tolerance for programs that run in the presence of environment actions. We prove the soundness, completeness and the complexity of our algorithms. We have implemented all of our algorithms using symbolic techniques in Java. The experimental results of our algorithms for various examples are also provided